    Finding Maximal Exact Matches in Graphs

    Indexable Elastic Founder Graphs of Minimum Height

    Indexable elastic founder graphs have recently been proposed as a data structure supporting fast pattern matching queries in genomics applications. Consider segmenting a multiple sequence alignment MSA[1..m,1..n] into b blocks MSA[1..m,1..j₁], MSA[1..m,j₁+1..j₂], …, MSA[1..m,j_{b-1}+1..n]. The resulting elastic founder graph (EFG) is obtained by merging, in each block, the strings that are equivalent after the removal of gap symbols, taking these strings as the nodes of the block and the original MSA connections as edges. We call an elastic founder graph indexable if a node label occurs as a prefix of only those paths that start from a node of the same block. Equi et al. (ISAAC 2021) showed that such EFGs support fast pattern matching and studied their construction maximizing the number of blocks and minimizing the maximum length of a block, but left open the case of minimizing the maximum number of distinct strings in a block, which we call the graph height. For the simplified gapless setting, we give an O(mn)-time algorithm to find a segmentation of an MSA minimizing the height of the resulting indexable founder graph, by combining previous results on segmentation algorithms and founder graphs. For the general setting, the known techniques yield a linear-time parameterized solution on a constant alphabet Σ, taking O(mn² log|Σ|) time in the worst case, so we study the refined measure of prefix-aware height, which omits counting strings that are prefixes of another considered string. The indexable EFG minimizing the maximum prefix-aware height provides a lower bound for the original height. By exploiting suffix trees built from the MSA rows and the data structure of Belazzougui et al. (CPM 2021) that answers weighted ancestor queries in constant time, we give an O(mn)-time algorithm for the optimal EFG under this alternative height.
    Peer reviewed
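To make the definitions of height and indexability concrete, here is a minimal brute-force sketch for the gapless setting only (no gap symbols). It assumes the rows are equal-length strings and that a segmentation is given as 1-based inclusive block end columns j₁ < … < j_b = n; the function names `founder_graph`, `height`, `occurrences` and `is_indexable` are hypothetical. It follows graph paths exhaustively and is in no way the O(mn) algorithm described above.

```python
# Brute-force sketch for the gapless setting only (no gap symbols).
# All function names are hypothetical; this is not the paper's O(mn) algorithm.

def founder_graph(rows, cuts):
    """Build the founder graph of a gapless MSA for a given segmentation.

    rows: equal-length strings; cuts: block end columns j_1 < ... < j_b = n
    (1-based, inclusive). Nodes are (block index, string) pairs.
    """
    starts = [0] + cuts[:-1]
    blocks = [{row[a:b] for row in rows} for a, b in zip(starts, cuts)]
    edges = set()
    for row in rows:
        for k in range(len(cuts) - 1):
            u = (k, row[starts[k]:cuts[k]])
            v = (k + 1, row[starts[k + 1]:cuts[k + 1]])
            edges.add((u, v))
    return blocks, edges


def height(blocks):
    """Maximum number of distinct strings (nodes) in a block."""
    return max(len(b) for b in blocks)


def occurrences(pattern, blocks, edges):
    """Yield (node, offset) for every start of `pattern` along a graph path."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)

    def spells(node, offset, i):
        # Match pattern[i:] from `offset` inside the node label, then follow edges.
        label = node[1]
        while i < len(pattern) and offset < len(label):
            if pattern[i] != label[offset]:
                return False
            i, offset = i + 1, offset + 1
        if i == len(pattern):
            return True
        return any(spells(nxt, 0, i) for nxt in succ.get(node, []))

    for k, block in enumerate(blocks):
        for s in block:
            for offset in range(len(s)):
                if spells((k, s), offset, 0):
                    yield (k, s), offset


def is_indexable(blocks, edges):
    """Every label may occur only at the start of a node in its own block."""
    for k, block in enumerate(blocks):
        for s in block:
            for (j, _), offset in occurrences(s, blocks, edges):
                if offset != 0 or j != k:
                    return False
    return True


rows = ["ACGT", "ACCT", "TCGT"]
blocks, edges = founder_graph(rows, [2, 4])
print(height(blocks), is_indexable(blocks, edges))  # 2 True
```

With the segmentation [2, 4] the toy MSA yields a graph of height 2 that passes the check, whereas single-column blocks fail it because, for instance, the label T of the first block also occurs in the last block.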

    3coSoKu and its declarative modeling

    In this paper, we analyze the physical puzzle IcoSoKu, a game about placing given triangular tiles on the faces of an icosahedron so as to fill the capacities of its vertices, and we propose its generalization, 3coSoKu, which admits an arbitrary playing field with triangular faces, arbitrary capacities, and an arbitrary set of triangular tiles. First, we prove that 3coSoKu is strongly NP-complete, even when the playing field is a convex polyhedron with equilateral triangles as faces. Second, we encode 3coSoKu both in the constraint modeling language MiniZinc and in Answer Set Programming, a logic programming paradigm, and we develop a visual tool that provides an accessible interface to the solver. Finally, we use our encodings to verify experimentally that every initial state of IcoSoKu admits a solution.
    Peer reviewed
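A simplified model of the puzzle can be checked with plain backtracking, which may help to see what the declarative encodings have to express. The sketch assumes that each tile carries a nonnegative value at each corner, that tiles may be rotated but not flipped, and that a vertex is filled when the values placed on it sum exactly to its capacity; `solve` is a hypothetical name, and this is neither the MiniZinc nor the ASP encoding from the paper.

```python
# Hypothetical plain backtracking sketch, NOT the MiniZinc or ASP encoding.
# Assumed model: each tile has a nonnegative value at each of its three
# corners and may be rotated (not flipped) when placed on a face; a vertex
# is filled when the corner values placed on it sum to its capacity.

def solve(faces, capacity, tiles):
    """faces: vertex triples; capacity: per-vertex targets; tiles: value triples.

    Returns one (tile index, rotation) pair per face, or None if unsolvable.
    """
    load = [0] * len(capacity)
    used = [False] * len(tiles)
    placement = []

    def place(f):
        if f == len(faces):
            return load == list(capacity)
        for t, tile in enumerate(tiles):
            if used[t]:
                continue
            for r in range(3):
                vals = tile[r:] + tile[:r]  # rotate the corner values
                if any(load[v] + x > capacity[v] for v, x in zip(faces[f], vals)):
                    continue  # prune: some vertex would exceed its capacity
                for v, x in zip(faces[f], vals):
                    load[v] += x
                used[t] = True
                placement.append((t, r))
                if place(f + 1):
                    return True
                placement.pop()  # undo and try the next option
                used[t] = False
                for v, x in zip(faces[f], vals):
                    load[v] -= x
        return False

    return placement if place(0) else None


# Toy instance: a tetrahedron whose four tiles each carry a single dot.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
tiles = [(1, 0, 0)] * 4
print(solve(faces, [1, 1, 1, 1], tiles))  # [(0, 0), (1, 1), (2, 2), (3, 0)]
print(solve(faces, [4, 0, 0, 0], tiles))  # None: vertex 0 lies on only 3 faces
```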

    Linear Time Construction of Indexable Elastic Founder Graphs

    Pattern matching of strings in labeled graphs has been widely studied lately due to its importance in genomics applications. Unfortunately, even the simplest problem of deciding whether a string appears as a subpath of a graph admits a quadratic lower bound under the Orthogonal Vectors Hypothesis (Equi et al. ICALP 2019, SOFSEM 2021). To avoid this bottleneck, research has shifted towards more specific graph classes, e.g. those induced from multiple sequence alignments (MSAs). Consider segmenting MSA[1..m, 1..n] into b blocks MSA[1..m, 1..j₁], MSA[1..m, j₁+1..j₂], …, MSA[1..m, j_{b-1}+1..n]. The distinct strings in the rows of the blocks, after the removal of gap symbols, form the nodes of an elastic founder graph (EFG), where the edges represent the original connections observed in the MSA. An EFG is called indexable if a node label occurs as a prefix of only those paths that start from a node of the same block. Equi et al. (ISAAC 2021) showed that such EFGs support fast pattern matching and gave an O(mn log m)-time algorithm for preprocessing the MSA in a way that allows the construction of indexable EFGs maximizing the number of blocks and, alternatively, minimizing the maximum length of a block, in O(n) and O(n log log n) time, respectively. Using the suffix tree and solving a novel ancestor problem on trees, we improve the preprocessing to O(mn) time and the O(n log log n)-time EFG construction to O(n) time, thus showing that both types of indexable EFGs can be constructed in time linear in the input size.
    Peer reviewed
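The block-maximization problem has the shape of a classic segmentation dynamic program. The sketch below assumes a hypothetical oracle `valid(x, y)` that decides whether columns x..y can form a block independently of the other blocks; it makes O(n²) oracle calls, so it only illustrates the recurrence and not the linear-time construction obtained above with suffix trees.

```python
from typing import Callable, List, Optional


def max_blocks(n: int, valid: Callable[[int, int], bool]) -> Optional[List[int]]:
    """Split columns 1..n into as many valid blocks as possible.

    valid(x, y) is a hypothetical oracle telling whether columns x..y
    (1-based, inclusive) may form a block. Returns the block end columns
    j_1 < ... < j_b = n, or None if no valid segmentation exists.
    """
    NONE = -1
    best = [NONE] * (n + 1)  # best[y]: max number of blocks covering 1..y
    start = [0] * (n + 1)    # start[y]: first column of the last such block
    best[0] = 0
    for y in range(1, n + 1):
        for x in range(1, y + 1):
            if best[x - 1] != NONE and best[x - 1] + 1 > best[y] and valid(x, y):
                best[y] = best[x - 1] + 1
                start[y] = x
    if best[n] == NONE:
        return None
    cuts, y = [], n
    while y > 0:  # walk back through the chosen blocks
        cuts.append(y)
        y = start[y] - 1
    return cuts[::-1]


# Toy oracle: a block is valid when it spans at least two columns.
print(max_blocks(7, lambda x, y: y - x + 1 >= 2))  # [2, 4, 7]
```

Tracking max(best[x - 1], y - x + 1) and minimizing it, instead of counting blocks, gives the analogous recurrence for minimizing the maximum block length.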

    An unsupervised approach to disjointness learning based on terminological cluster trees

    In the context of the Semantic Web regarded as a Web of Data, research efforts have been devoted to improving the quality of the ontologies that are used as vocabularies to enable complex services based on automated reasoning. Various surveys indicate that many domains would require better ontologies that include non-negligible constraints for properly conveying the intended semantics. In this respect, disjointness axioms are representative of this general problem: these axioms are essential for making negative knowledge about the domain of interest explicit, yet they are often overlooked during the modeling process (thus affecting the efficacy of the reasoning services). To tackle this problem, automated methods for discovering these axioms can be used as a tool for supporting knowledge engineers in modeling new ontologies or evolving existing ones. Current solutions, either based on statistical correlations or relying on external corpora, often do not fully exploit the terminology. Stemming from this consideration, we have been investigating alternative methods to elicit disjointness axioms from existing ontologies based on the induction of terminological cluster trees, which are logic trees in which each node stands for a cluster of individuals that emerges as a sub-concept. The growth of such trees relies on a divide-and-conquer procedure that assigns, to the cluster representing the root node, one of the concept descriptions generated via a refinement operator and selected according to a heuristic based on minimizing the risk of overlap between the candidate sub-clusters (quantified in terms of the distance between two prototypical individuals). Preliminary work has shown some shortcomings, which are tackled in this paper.
To tackle the task of disjointness axiom discovery, we have extended the terminological cluster tree induction framework with various contributions: 1) the adoption of different distance measures for clustering the individuals of a knowledge base; 2) the adoption of different heuristics for selecting the most promising concept descriptions; 3) a modified version of the refinement operator to prevent the introduction of inconsistency during the elicitation of the new axioms. A wide empirical evaluation showed the feasibility of the proposed extensions and the improvement with respect to alternative approaches.
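The split-selection step can be illustrated with a heavily simplified sketch. It assumes individuals are plain numeric feature vectors compared with Euclidean distance, takes medoids as the prototypical individuals, and treats candidate concepts as Python predicates; the actual framework works on knowledge-base individuals with its own distance measures and refinement operator, and the names `medoid` and `best_split` are hypothetical. A larger distance between the two prototypes is used here as the proxy for a lower risk of overlap.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = Tuple[float, ...]


def medoid(points: List[Vector]) -> Vector:
    """Prototype of a cluster: the point with least total distance to the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def best_split(individuals: Dict[str, Vector],
               candidates: Dict[str, Callable[[str], bool]]) -> Tuple[Optional[str], float]:
    """Pick the candidate concept whose two sub-clusters have the most distant
    prototypes, used here as a stand-in for 'least risk of overlap'."""
    best_name, best_score = None, -1.0
    for name, concept in candidates.items():
        pos = [v for ind, v in individuals.items() if concept(ind)]
        neg = [v for ind, v in individuals.items() if not concept(ind)]
        if not pos or not neg:
            continue  # the split must produce two non-empty sub-clusters
        score = math.dist(medoid(pos), medoid(neg))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


individuals = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (5.0, 5.0), "d": (5.0, 6.0)}
candidates = {"C1": lambda i: i in {"a", "b"}, "C2": lambda i: i in {"a", "c"}}
print(best_split(individuals, candidates)[0])  # C1
```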

    Solving String Problems on Graphs Using the Labeled Direct Product

    Suffix trees are an important data structure at the core of optimal solutions to many fundamental string problems, such as exact pattern matching, longest common substring, matching statistics, and longest repeated substring. Recent lines of research have focused on extending some of these problems to vertex-labeled graphs, either by using efficient ad-hoc approaches that do not generalize to all input graphs, or by indexing difficult graphs at the cost of worst-case exponential complexity. In the absence of a ubiquitous and polynomial tool like the suffix tree for labeled graphs, we introduce the labeled direct product of two graphs as a general tool for obtaining optimal algorithms in the worst case: we obtain conceptually simpler algorithms for the quadratic problems of string matching (SMLG) and longest common substring (LCSP) in labeled graphs. Our algorithms run in time linear in the size of the labeled product graph, which may be smaller than quadratic for some inputs, and their running time is predictable, because the size of the labeled direct product graph can be precomputed efficiently. We also solve LCSP on graphs containing cycles, which was left as an open problem by Shimohira et al. in 2011. To show the power of the labeled product graph, we also apply it to solve the matching statistics (MSP) and the longest repeated string (LRSP) problems in labeled graphs. Moreover, we show that our (worst-case quadratic) algorithms are also optimal, conditioned on the Orthogonal Vectors Hypothesis. Finally, we complete the complexity picture around LRSP by studying it on undirected graphs.
    Peer reviewed
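The labeled direct product can be sketched in a few lines for the simplest case. Assuming character-labeled nodes and two DAGs, a path in the product spells a string that is spelled by a path in each input graph, so the longest path in the product gives the length of the longest common substring. The names are hypothetical, cyclic inputs are not handled, and the product is built naively from all pairs of edges, so this sketch does not meet the running-time bounds claimed above.

```python
from functools import lru_cache
from itertools import product


def labeled_direct_product(g1, g2):
    """Each graph is (labels, edges): labels maps node -> character and
    edges is a set of (u, v) pairs. Product nodes are label-equal pairs;
    (u, v) -> (u2, v2) is an edge iff u -> u2 in g1 and v -> v2 in g2."""
    (lab1, e1), (lab2, e2) = g1, g2
    nodes = {(u, v) for u, v in product(lab1, lab2) if lab1[u] == lab2[v]}
    edges = {((u, v), (u2, v2))
             for u, u2 in e1 for v, v2 in e2
             if (u, v) in nodes and (u2, v2) in nodes}
    return nodes, edges


def longest_common_path_string(g1, g2):
    """Length of the longest string spelled by a path in both DAGs."""
    nodes, edges = labeled_direct_product(g1, g2)
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)

    @lru_cache(maxsize=None)
    def longest_from(n):
        # Nodes on the longest product path starting at n; the product of
        # two DAGs is a DAG, so this recursion terminates.
        return 1 + max((longest_from(m) for m in succ[n]), default=0)

    return max((longest_from(n) for n in nodes), default=0)


# g1 spells ACGT; g2 spells TCG and TCA (node 1 branches to nodes 2 and 3).
g1 = ({0: "A", 1: "C", 2: "G", 3: "T"}, {(0, 1), (1, 2), (2, 3)})
g2 = ({0: "T", 1: "C", 2: "G", 3: "A"}, {(0, 1), (1, 2), (1, 3)})
print(longest_common_path_string(g1, g2))  # 2 (the string CG)
```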

    GPR investigations for the study and the restoration of the rose window of Troia Cathedral (southern Italy)

    The development of cracks and distortions caused by past seismic events compromised the integrity of the rose window of Troia Cathedral, one of the most precious Romanesque monuments in southern Italy. Ground-penetrating radar (GPR) using high-frequency antennae (mainly 1500 MHz) was selected from among various non-destructive testing methods for its high-resolution imaging, to scan the internal structure of the various architectural elements of the wheel window: the decimetre-diameter columns constituting the rays, the ring decorated with intersecting arched ribwork, and the surrounding circular ashlar curb. GPR was employed in the classical continuous reflection mode, moving the antennae manually along the architectural elements and taking exceptional care in the acquisition and processing stages to avoid positioning errors. Indeed, the challenging aspects of this case study were the geometrical complexity and small dimensions of the structural elements, which caused many logistic and coupling problems. In spite of this, through proper interpretation techniques, based on signal analysis (presence of reflections and diffractions, velocity and attenuation variations) and correlation with features detected by visual inspection of the external surfaces, the GPR survey provided useful information on the internal structure of the rose window, detecting fractures and the boundaries of previously restored parts and locating hidden metallic components connecting the architectural elements. Information on the internal structure and spatial distribution of metallic junctions was essential for gaining insight into building techniques, in order to discriminate between restoration strategies that may require either total or partial dismantling of the rose window. GPR results provided crucial evidence in favour of one of the (conflicting) hypotheses about the original building techniques, leading to the selection of partial dismantling as the most suitable restoration strategy.
Analysis of the measurements revealed the potential of GPR in the field of cultural heritage restoration, even in cases characterized by complex geometry, structural brittleness and logistic difficulties, such as the one discussed in this paper.

    Chaining of Maximal Exact Matches in Graphs

    We study the problem of finding maximal exact matches (MEMs) between a query string $Q$ and a labeled directed acyclic graph (DAG) $G=(V,E,\ell)$ and subsequently co-linearly chaining these matches. We show that it suffices to compute MEMs between node labels and $Q$ (node MEMs) to encode full MEMs. Node MEMs can be computed in linear time, and we show how to co-linearly chain them to solve the Longest Common Subsequence (LCS) problem between $Q$ and $G$. Our chaining algorithm is the first to consider a symmetric formulation of the chaining problem in graphs and runs in $O(k^2|V| + |E| + kN\log N)$ time, where $k$ is the width (minimum number of paths covering the nodes) of $G$, and $N$ is the number of node MEMs. We then consider the problem of finding MEMs when the input graph is an indexable elastic founder graph (a subclass of labeled DAGs studied by Equi et al., Algorithmica 2022). For arbitrary input graphs, the problem cannot be solved in truly sub-quadratic time under SETH (Equi et al., ICALP 2019). We show that we can report all MEMs between $Q$ and an indexable elastic founder graph in $O(nH^2 + m + M_\kappa)$ time, where $n$ is the total length of node labels, $H$ is the maximum number of nodes in a block of the graph, $m = |Q|$, and $M_\kappa$ is the number of MEMs of length at least $\kappa$. The results extend to the indexing problem, where the graph is preprocessed and a set of queries is processed as a batch.
    Comment: 19 pages, 1 figure
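A brute-force sketch of node MEMs, assuming a node MEM is a match between $Q$ and a single node label that cannot be extended by one character in either direction within that label, and using the threshold $\kappa$ from the abstract as `kappa`. The function name is hypothetical; it runs in O(|Q| · n · L) time for maximum label length L and is not the linear-time method, nor the chaining or founder-graph algorithms.

```python
def node_mems(query, labels, kappa=1):
    """Brute-force node MEMs: maximal exact matches between `query` and each
    single node label. Returns (node, i, j, length) tuples with
    query[i:i + length] == labels[node][j:j + length]."""
    out = []
    for node, t in labels.items():
        for i in range(len(query)):
            for j in range(len(t)):
                # Skip unless left-maximal (no equal characters just before).
                if i > 0 and j > 0 and query[i - 1] == t[j - 1]:
                    continue
                k = 0
                while i + k < len(query) and j + k < len(t) and query[i + k] == t[j + k]:
                    k += 1
                # Extending as far as possible makes the match right-maximal.
                if k >= max(kappa, 1):
                    out.append((node, i, j, k))
    return out


print(node_mems("ACGTT", {"u": "CGA", "v": "TTG"}, kappa=2))
# [('u', 1, 0, 2), ('v', 3, 0, 2)]
```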